Part 2.3: Data Visualization and Impact Analysis¶
Modular Functions for Impact Analysis & Visualization
This section provides reusable, parameterized functions for analyzing and visualizing performance metrics across temporal and categorical dimensions. Designed for flexibility and clarity, the functions support:
Dynamic grouping by time (year, month_name, day_name) or category (store, promo, etc.)
Preprocessing filters to exclude non-operational records (e.g., closed stores, zero-sales days)
Statistical summaries including mean, standard deviation, and count
Ranked insights with volatility and performance differentials
Interactive visualizations via Plotly for enhanced interpretability
These tools enable scalable impact assessments and trend analyses across diverse datasets with minimal code repetition.
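The pattern all of these functions share can be sketched on a few rows of synthetic data (a minimal illustration; the column names mirror but are not tied to the Rossmann schema):

```python
import pandas as pd

# Minimal sketch of the shared pattern: filter out non-operational rows,
# group by a dynamic dimension, and summarize with mean/std/count.
df = pd.DataFrame({
    'day':   ['Mon', 'Mon', 'Tue', 'Tue', 'Tue'],
    'sales': [0, 100, 200, 300, 400],   # 0 marks a closed-store day
})

open_days = df[df['sales'] > 0]          # preprocessing filter
summary = (open_days.groupby('day')['sales']
           .agg(['mean', 'std', 'count'])  # statistical summary
           .reset_index())
print(summary)
```

Each analysis function below is this skeleton plus a Plotly chart and a printed ranking.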
1. Setup & Import Libraries¶
import time
# Step 1: Setup & Import Libraries
print("Step 1: Setup and Import Libraries started...")
time.sleep(1) # Simulate processing time
Step 1: Setup and Import Libraries started...
# Data Manipulation & Processing
import math
import numpy as np
import pandas as pd
import scipy.stats as stats
from datetime import datetime
from sklearn.preprocessing import StandardScaler, MinMaxScaler  # explicit imports instead of a wildcard
# Data Visualization
import seaborn as sbn
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import plotly.io as pio
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from pandas.plotting import scatter_matrix
# to ensure Plotly works in both Jupyter and HTML export
pio.renderers.default = "notebook+plotly_mimetype"
sbn.set(rc={'figure.figsize':(14,6)})
plt.style.use('seaborn-v0_8')
sbn.set_palette("husl")
# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 1000)
pd.set_option('display.float_format','{:.2f}'.format)
# Warnings
import warnings
warnings.filterwarnings('ignore')  # suppress all warnings for a cleaner report
print("="*60)
print("Rossman Store Sales Time Series Analysis - Part 2")
print("="*60)
print("All libraries imported successfully!")
print("Analysis Date:", pd.Timestamp.now().strftime('%Y-%m-%d %H:%M:%S'))
============================================================
Rossmann Store Sales Time Series Analysis - Part 2
============================================================
All libraries imported successfully!
Analysis Date: 2025-08-13 21:22:59
print("✅ Setup and Import Liraries completed.\n")
✅ Setup and Import Liraries completed.
# Start Impact Analysis
viz_impact_analysis_begin = pd.Timestamp.now()
bold_start = '\033[1m'
bold_end = '\033[0m'
print("🔍 Viz impact Analysis Started ...")
print(f"🟢 Begin Date: {bold_start}{viz_impact_analysis_begin.strftime('%Y-%m-%d %H:%M:%S')}{bold_end}\n")
🔍 Viz Impact Analysis Started ...
🟢 Begin Date: 2025-08-13 21:23:00
Restore the file¶
%store -r df_viz_feat
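`%store -r` only succeeds if `df_viz_feat` was stored by the preceding notebook in the same IPython profile. For running this notebook standalone, a fallback load can help; the pickle path below is an assumption for illustration, not an actual project artifact:

```python
import pandas as pd

# Fallback sketch: if the %store'd variable is unavailable, try reloading
# the feature table from disk. 'df_viz_feat.pkl' is a hypothetical path.
try:
    df_viz_feat
except NameError:
    try:
        df_viz_feat = pd.read_pickle('df_viz_feat.pkl')  # hypothetical path
    except FileNotFoundError:
        print("df_viz_feat not found; re-run the feature-engineering notebook first.")
```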
View or Display Dataset¶
print("\nTrain Data Preview:")
print("\n",df_viz_feat.head())
Train Data Preview:
store dayofweek date sales customers open promo stateholiday schoolholiday day week month quarter year isweekend isholiday isschoolDay
982643 1115 2 2013-01-01 0 0 0 No Promo Public 1 Tue 1 Jan 1 2013 False True False
982640 1112 2 2013-01-01 0 0 0 No Promo Public 1 Tue 1 Jan 1 2013 False True False
982639 1111 2 2013-01-01 0 0 0 No Promo Public 1 Tue 1 Jan 1 2013 False True False
982638 1110 2 2013-01-01 0 0 0 No Promo Public 1 Tue 1 Jan 1 2013 False True False
982637 1109 2 2013-01-01 0 0 0 No Promo Public 1 Tue 1 Jan 1 2013 False True False
# Step 2: Features Engineering
print("Step 2: Features Engineering started...")
time.sleep(1) # Simulate processing time
Step 2: Features Engineering started...
# Make a copy of the original dataframe to avoid modifying it
df_features = df_viz_feat.copy()
Promotion Impact Analysis¶
def analyze_promotion_impact(df, target_col='sales', category_col='day', width=1200, height=500):
    """
    Analyze promotion impact across any category (day, month, store, etc.)

    Parameters:
    - df: DataFrame with 'promo' column
    - target_col: metric to analyze ('sales' or 'customers')
    - category_col: grouping category ('day', 'month', 'store', etc.)
    """
    # Filter out closed stores (e.g., sales = 0) for an unbiased comparison
    df = df[df[target_col] > 0].copy()

    # Create summary data
    summary = df.groupby([category_col, 'promo'])[target_col].mean().reset_index()

    # Visualization
    fig = px.bar(
        summary,
        x=category_col,
        y=target_col,
        color='promo',
        title=f'Promotion Impact: {target_col.title()} by {category_col.title()}',
        color_discrete_map={'Promo': '#636EFA', 'No Promo': '#EF553B'},
        barmode='group'
    )
    fig.update_layout(title_x=0.5, height=height, width=width)
    fig.show(config={'displayModeBar': True, 'displaylogo': False})

    # Impact analysis
    currency = "€" if target_col == 'sales' else ""
    print(f"Promotion Impact Analysis - {target_col.title()} by {category_col.title()}:")
    print("=" * 60)
    categories = sorted(df[category_col].unique())
    for category in categories:
        no_promo = df[(df[category_col] == category) & (df['promo'] == 'No Promo')][target_col].mean()
        promo = df[(df[category_col] == category) & (df['promo'] == 'Promo')][target_col].mean()
        if not pd.isna(no_promo) and not pd.isna(promo):
            lift = ((promo - no_promo) / no_promo) * 100
            print(f"{str(category):12}: No Promo {currency}{no_promo:6,.0f} | Promo {currency}{promo:6,.0f} | Lift {lift:+5.1f}%")

    # Overall summary
    overall_no_promo = df[df['promo'] == 'No Promo'][target_col].mean()
    overall_promo = df[df['promo'] == 'Promo'][target_col].mean()
    overall_lift = ((overall_promo - overall_no_promo) / overall_no_promo) * 100
    metric_label = 'revenue' if target_col == 'sales' else target_col
    print(f"\nOverall Impact:")
    print(f"Average lift from promotions: {overall_lift:+.1f}%")
    print(f"Additional {metric_label} per day: {currency}{overall_promo - overall_no_promo:,.0f}")
    return summary
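The lift arithmetic can be verified on a tiny synthetic frame (illustrative numbers only, not Rossmann data):

```python
import pandas as pd

# Synthetic check of the lift formula used above:
# lift = (promo_mean - no_promo_mean) / no_promo_mean * 100
df = pd.DataFrame({
    'promo': ['No Promo', 'No Promo', 'Promo', 'Promo'],
    'sales': [100, 100, 130, 130],
})
no_promo = df[df['promo'] == 'No Promo']['sales'].mean()
promo = df[df['promo'] == 'Promo']['sales'].mean()
lift = (promo - no_promo) / no_promo * 100
print(f"Lift: {lift:+.1f}%")  # 130 vs 100 -> +30.0%
```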
Top performer Impact Analysis¶
def analyze_top_performers(df, group_col='store', target_col='sales', top_n=10, width=1200, height=500):
    """
    Analyze top performing entities (stores, days, months, etc.)

    Parameters:
    - df: DataFrame
    - group_col: column to group by ('store', 'day', 'month', etc.)
    - target_col: metric to analyze ('sales' or 'customers')
    - top_n: number of top performers to show
    """
    # Filter out closed stores (e.g., sales = 0) for an unbiased comparison
    df = df[df[target_col] > 0].copy()

    # Calculate averages and get top performers
    group_avg = df.groupby(group_col)[target_col].mean()
    top_performers = group_avg.nlargest(top_n)

    # Visualization
    fig = px.bar(
        x=top_performers.index.astype(str),
        y=top_performers.values,
        title=f'Top {top_n} {group_col.title()} by Average {target_col.title()}',
        labels={'x': group_col.title(), 'y': f'Average {target_col.title()}'}
    )
    fig.update_layout(title_x=0.5, height=height, width=width)
    fig.show(config={'displayModeBar': True, 'displaylogo': False})

    # Performance analysis
    currency = "€" if target_col == 'sales' else ""  # defined once, before the loop
    print(f"Top {top_n} {group_col.title()} Performance Analysis:")
    print("=" * 55)
    print(f"{'Rank':<4} {group_col.title():<10} {'Average':<15} {'% of #1':<10}")
    print("-" * 55)
    for i, (entity, avg_value) in enumerate(top_performers.items(), 1):
        pct_of_top = (avg_value / top_performers.iloc[0]) * 100
        print(f"{i:<4} {str(entity):<10} {currency}{avg_value:>9,.0f} {pct_of_top:>6.1f}%")

    # Summary statistics
    total_entities = len(group_avg)
    top_avg = top_performers.mean()
    overall_avg = group_avg.mean()
    performance_gap = ((top_avg - overall_avg) / overall_avg) * 100
    print(f"\nSummary Statistics:")
    print(f"Total {group_col}s analyzed: {total_entities:,}")
    print(f"Top {top_n} average: {currency}{top_avg:,.0f}")
    print(f"Overall average: {currency}{overall_avg:,.0f}")
    print(f"Top {top_n} outperform by: {performance_gap:.1f}%")
    return top_performers
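The ranking logic is easy to sanity-check on synthetic data: `nlargest` orders group means descending, and "% of #1" expresses each mean relative to the leader (illustrative numbers only):

```python
import pandas as pd

# Synthetic check of the top-performer ranking used above.
df = pd.DataFrame({
    'store': [1, 1, 2, 2, 3, 3],
    'sales': [100, 100, 300, 300, 200, 200],
})
top2 = df.groupby('store')['sales'].mean().nlargest(2)
pct_of_top = (top2 / top2.iloc[0] * 100).round(1)
print(top2.index.tolist())   # leader is store 2, then store 3
print(pct_of_top.tolist())   # [100.0, 66.7]
```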
Temporal Trends Impact Analysis¶
def analyze_temporal_trends(df, time_col='month', target_col='sales', width=1200, height=500):
    """
    Analyze trends over time periods (month_name, day_name, year, etc.)

    Parameters:
    - df: DataFrame
    - time_col: time dimension ('month_name', 'day_name', 'year', etc.)
    - target_col: metric to analyze ('sales' or 'customers')
    """
    # Filter out closed stores (e.g., sales = 0) for an unbiased comparison
    df = df[df[target_col] > 0].copy()

    # Calculate averages by time period
    time_stats = df.groupby(time_col).agg({
        target_col: ['mean', 'std', 'count']
    }).round(0)
    time_stats.columns = ['avg', 'std', 'count']
    time_stats = time_stats.reset_index().sort_values('avg', ascending=False)

    # Visualization
    fig = px.bar(
        time_stats,
        x=time_col,
        y='avg',
        title=f'{target_col.title()} Performance by {time_col.title()}',
        color='avg',
        color_continuous_scale='viridis'
    )
    fig.update_layout(title_x=0.5, height=height, width=width)
    fig.show(config={'displayModeBar': True, 'displaylogo': False})

    # Trend analysis
    currency = "€" if target_col == 'sales' else ""
    print(f"{time_col.title()} Performance Analysis:")
    print("=" * 50)
    print(f"{'Rank':<4} {time_col.title():<12} {'Average':<15} {'Std Dev':<10} {'Count':<8}")
    print("-" * 60)
    # enumerate() gives the true rank; using the iterrows() index alone would
    # expose the pre-sort row labels and mislabel the Rank column
    for rank, (_, row) in enumerate(time_stats.iterrows(), 1):
        print(f"{rank:<4} {str(row[time_col]):<12} {currency}{row['avg']:>9,.0f} {currency}{row['std']:>6,.0f} {row['count']:>6,.0f}")

    # Key insights
    best_period = time_stats.iloc[0][time_col]
    worst_period = time_stats.iloc[-1][time_col]
    best_value = time_stats.iloc[0]['avg']
    worst_value = time_stats.iloc[-1]['avg']
    volatility = ((best_value - worst_value) / time_stats['avg'].mean()) * 100
    print(f"\nKey Insights:")
    print(f"Best {time_col}: {best_period} ({currency}{best_value:,.0f})")
    print(f"Worst {time_col}: {worst_period} ({currency}{worst_value:,.0f})")
    print(f"Performance range: {currency}{best_value - worst_value:,.0f}")
    print(f"Volatility: {volatility:.1f}%")
    return time_stats
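The volatility measure is the best-to-worst spread relative to the overall mean; a quick synthetic check (illustrative numbers only):

```python
import pandas as pd

# Synthetic check of the volatility measure used above:
# volatility = (best - worst) / overall_mean * 100
avgs = pd.Series([8000.0, 7000.0, 6000.0])
volatility = (avgs.max() - avgs.min()) / avgs.mean() * 100
print(f"Volatility: {volatility:.1f}%")  # (8000-6000)/7000 -> ~28.6%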
State Holiday Impact Analysis¶
def analyze_stateholiday_impact(df, target_col='sales', category_col='day', width=1200, height=500):
    """
    Analyze the impact of state holidays across a specified category (e.g., day, month, store).

    Parameters:
    - df: DataFrame with 'stateholiday' column already mapped to labels
    - target_col: metric to analyze ('sales' or 'customers')
    - category_col: grouping category ('day', 'month', 'store', etc.)
    """
    # Filter out closed stores (e.g., sales = 0) for an unbiased comparison
    df_open = df[df[target_col] > 0].copy()

    # Create summary data
    summary = df_open.groupby([category_col, 'stateholiday'])[target_col].mean().reset_index()

    # Visualization
    fig = px.bar(
        summary,
        x=category_col,
        y=target_col,
        color='stateholiday',
        title=f'State Holiday Impact: {target_col.title()} by {category_col.title()}',
        barmode='group'
    )
    fig.update_layout(title_x=0.5, height=height, width=width)
    fig.show(config={'displayModeBar': True, 'displaylogo': False})

    # Impact analysis
    currency = "€" if target_col == 'sales' else ""
    print(f"State Holiday Impact Analysis - {target_col.title()} by {category_col.title()}:")
    print("=" * 70)

    # Get regular day baseline
    regular_avg = df_open[df_open['stateholiday'] == 'Normal Day'][target_col].mean()

    # Overall holiday impact
    print("Overall Holiday Impact:")
    print("-" * 25)
    for holiday_label in ['Public', 'Easter', 'Christmas']:
        if holiday_label in df_open['stateholiday'].values:
            holiday_avg = df_open[df_open['stateholiday'] == holiday_label][target_col].mean()
            if not pd.isna(holiday_avg):
                impact = ((holiday_avg - regular_avg) / regular_avg) * 100
                print(f"{holiday_label:15}: {currency}{holiday_avg:6,.0f} ({impact:+5.1f}% vs regular)")
    print(f"Regular Days: {currency}{regular_avg:6,.0f} (baseline)")

    # Category-wise analysis
    print(f"\nHoliday Impact by {category_col.title()}:")
    print("-" * 35)
    categories = sorted(df[category_col].unique())
    for category in categories:
        regular = df_open[(df_open[category_col] == category) & (df_open['stateholiday'] == 'Normal Day')][target_col].mean()
        holiday = df_open[(df_open[category_col] == category) & (df_open['stateholiday'] != 'Normal Day')][target_col].mean()
        if not pd.isna(regular):
            if not pd.isna(holiday):
                impact = ((holiday - regular) / regular) * 100
                print(f"{str(category):12}: Regular {currency}{regular:6,.0f} | Holiday {currency}{holiday:6,.0f} | Impact {impact:+5.1f}%")
            else:
                print(f"{str(category):12}: Regular {currency}{regular:6,.0f} | No holiday data")

    # Store closure analysis (uses the unfiltered frame, so closures are counted)
    total_records = len(df)
    closed_stores = len(df[df[target_col] == 0])
    holiday_closures = len(df[(df['stateholiday'] != 'Normal Day') & (df[target_col] == 0)])
    regular_closures = len(df[(df['stateholiday'] == 'Normal Day') & (df[target_col] == 0)])
    print(f"\nStore Operations Impact:")
    print("-" * 25)
    print(f"Total store closures: {closed_stores:,} ({(closed_stores/total_records)*100:.1f}%)")
    print(f"Holiday closures: {holiday_closures:,}")
    print(f"Regular closures: {regular_closures:,}")
    return summary
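The holiday-impact arithmetic compares each holiday mean against the 'Normal Day' baseline; a quick synthetic check (illustrative numbers only):

```python
import pandas as pd

# Synthetic check of the holiday-impact formula used above:
# impact = (holiday_mean - baseline) / baseline * 100
df = pd.DataFrame({
    'stateholiday': ['Normal Day', 'Normal Day', 'Public', 'Public'],
    'sales': [100, 100, 150, 150],
})
baseline = df[df['stateholiday'] == 'Normal Day']['sales'].mean()
public = df[df['stateholiday'] == 'Public']['sales'].mean()
impact = (public - baseline) / baseline * 100
print(f"Public: {impact:+.1f}% vs regular")  # +50.0%
```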
RUNNING TOP 4 IMPORTANT ANALYSES¶
# Example usage:
if __name__ == "__main__":
    # df_features is the feature-engineered dataset restored above
    print("=== RUNNING TOP 4 MOST IMPORTANT ANALYSES ===\n")

    # 1. Promotion Impact Analysis
    print("1. PROMOTION IMPACT ANALYSIS")
    print("-" * 40)
    promo_results = analyze_promotion_impact(df_features, target_col='sales', category_col='day')
    print("\n" + "=" * 80 + "\n")

    # 2. State Holiday Impact Analysis
    print("2. STATE HOLIDAY IMPACT ANALYSIS")
    print("-" * 40)
    holiday_results = analyze_stateholiday_impact(df_features, target_col='sales', category_col='day')
    print("\n" + "=" * 80 + "\n")

    # 3. Top Store Performance Analysis
    print("3. TOP STORE PERFORMANCE ANALYSIS")
    print("-" * 40)
    store_results = analyze_top_performers(df_features, group_col='store', target_col='sales', top_n=10)
    print("\n" + "=" * 80 + "\n")

    # 4. Temporal Trends Analysis
    print("4. TEMPORAL TRENDS ANALYSIS")
    print("-" * 40)
    trend_results = analyze_temporal_trends(df_features, time_col='month', target_col='sales')
=== RUNNING TOP 4 MOST IMPORTANT ANALYSES ===

1. PROMOTION IMPACT ANALYSIS
----------------------------------------
Promotion Impact Analysis - Sales by Day:
============================================================
Fri         : No Promo € 6,352 | Promo € 7,734 | Lift +21.8%
Mon         : No Promo € 6,205 | Promo € 9,776 | Lift +57.5%
Thu         : No Promo € 5,737 | Promo € 7,595 | Lift +32.4%
Tue         : No Promo € 5,716 | Promo € 8,277 | Lift +44.8%
Wed         : No Promo € 5,610 | Promo € 7,678 | Lift +36.9%

Overall Impact:
Average lift from promotions: +38.8%
Additional revenue per day: €2,302

================================================================================

2. STATE HOLIDAY IMPACT ANALYSIS
----------------------------------------
State Holiday Impact Analysis - Sales by Day:
======================================================================
Overall Holiday Impact:
-------------------------
Public         : € 8,487 (+22.1% vs regular)
Easter         : € 9,888 (+42.3% vs regular)
Christmas      : € 9,744 (+40.2% vs regular)
Regular Days: € 6,951 (baseline)

Holiday Impact by Day:
-----------------------------------
Fri         : Regular € 7,062 | Holiday €11,190 | Impact +58.5%
Mon         : Regular € 8,217 | Holiday € 8,191 | Impact  -0.3%
Sat         : Regular € 5,885 | Holiday €15,715 | Impact +167.1%
Sun         : Regular € 8,206 | Holiday € 4,977 | Impact -39.4%
Thu         : Regular € 6,750 | Holiday € 8,438 | Impact +25.0%
Tue         : Regular € 7,091 | Holiday € 6,053 | Impact -14.6%
Wed         : Regular € 6,715 | Holiday € 8,711 | Impact +29.7%

Store Operations Impact:
-------------------------
Total store closures: 168,494 (17.1%)
Holiday closures: 30,140
Regular closures: 138,354

================================================================================

3. TOP STORE PERFORMANCE ANALYSIS
----------------------------------------
Top 10 Store Performance Analysis:
=======================================================
Rank Store      Average         % of #1
-------------------------------------------------------
1    817        € 21,789        100.0%
2    262        € 20,684         94.9%
3    1114       € 20,576         94.4%
4    251        € 19,158         87.9%
5    842        € 18,472         84.8%
6    513        € 18,212         83.6%
7    562        € 17,985         82.5%
8    788        € 17,968         82.5%
9    383        € 17,293         79.4%
10   756        € 16,575         76.1%

Summary Statistics:
Total stores analyzed: 1,115
Top 10 average: €18,871
Overall average: €6,931
Top 10 outperform by: 172.3%

================================================================================

4. TEMPORAL TRENDS ANALYSIS
----------------------------------------
Month Performance Analysis:
==================================================
Rank Month        Average         Std Dev    Count
------------------------------------------------------------
1    Dec          €  8,609        € 4,094    50,393
2    Nov          €  7,189        € 3,010    51,401
3    May          €  7,107        € 3,008    80,099
4    Apr          €  7,047        € 3,138    81,726
5    Jun          €  7,001        € 3,145    82,571
6    Mar          €  6,977        € 3,124    85,975
7    Jul          €  6,910        € 3,038    55,388
8    Aug          €  6,649        € 2,902    54,411
9    Oct          €  6,603        € 2,815    53,291
10   Feb          €  6,589        € 2,871    80,239
11   Jan          €  6,564        € 2,849    86,335
12   Sep          €  6,547        € 2,889    52,321

Key Insights:
Best month: Dec (€8,609)
Worst month: Sep (€6,547)
Performance range: €2,062
Volatility: 29.5%
Top 5 performing days¶
# Top 5 performing days
analyze_top_performers(df_features, 'day', 'sales', 5)
Top 5 Day Performance Analysis:
=======================================================
Rank Day        Average         % of #1
-------------------------------------------------------
1    Mon        €  8,217        100.0%
2    Sun        €  8,205         99.8%
3    Tue        €  7,091         86.3%
4    Fri        €  7,066         86.0%
5    Thu        €  6,756         82.2%

Summary Statistics:
Total days analyzed: 7
Top 5 average: €7,467
Overall average: €7,134
Top 5 outperform by: 4.7%
day
Mon    8217.44
Sun    8204.63
Tue    7090.99
Fri    7066.37
Thu    6756.03
Name: sales, dtype: float64
Yearly sales trends¶
# Yearly sales trends
analyze_temporal_trends(df_features, 'year', 'sales')
Year Performance Analysis:
==================================================
Rank Year         Average         Std Dev    Count
------------------------------------------------------------
1    2015.0       €  7,098        € 3,051    165,841
2    2014.0       €  7,026        € 3,129    310,385
3    2013.0       €  6,815        € 3,115    337,924

Key Insights:
Best year: 2015.0 (€7,098)
Worst year: 2013.0 (€6,815)
Performance range: €283
Volatility: 4.1%
| | year | avg | std | count |
|---|---|---|---|---|
| 2 | 2015 | 7098.00 | 3051.00 | 165841 |
| 1 | 2014 | 7026.00 | 3129.00 | 310385 |
| 0 | 2013 | 6815.00 | 3115.00 | 337924 |
Promotion impact on customers by month¶
# Promotion impact on customers by month
analyze_promotion_impact(df_features, 'customers', 'month')
Promotion Impact Analysis - Customers by Month:
============================================================
Apr         : No Promo    698 | Promo    863 | Lift +23.7%
Aug         : No Promo    684 | Promo    838 | Lift +22.6%
Dec         : No Promo    810 | Promo    998 | Lift +23.2%
Feb         : No Promo    679 | Promo    805 | Lift +18.5%
Jan         : No Promo    661 | Promo    799 | Lift +20.9%
Jul         : No Promo    675 | Promo    855 | Lift +26.7%
Jun         : No Promo    685 | Promo    858 | Lift +25.3%
Mar         : No Promo    681 | Promo    845 | Lift +24.1%
May         : No Promo    728 | Promo    842 | Lift +15.6%
Nov         : No Promo    723 | Promo    845 | Lift +17.0%
Oct         : No Promo    705 | Promo    819 | Lift +16.2%
Sep         : No Promo    679 | Promo    835 | Lift +22.9%

Overall Impact:
Average lift from promotions: +21.3%
Additional customers per day: 148
| | month | promo | customers |
|---|---|---|---|
| 0 | Apr | No Promo | 697.81 |
| 1 | Apr | Promo | 862.93 |
| 2 | Aug | No Promo | 683.70 |
| 3 | Aug | Promo | 838.39 |
| 4 | Dec | No Promo | 810.09 |
| 5 | Dec | Promo | 998.24 |
| 6 | Feb | No Promo | 679.25 |
| 7 | Feb | Promo | 805.11 |
| 8 | Jan | No Promo | 661.39 |
| 9 | Jan | Promo | 799.34 |
| 10 | Jul | No Promo | 675.14 |
| 11 | Jul | Promo | 855.10 |
| 12 | Jun | No Promo | 685.07 |
| 13 | Jun | Promo | 858.47 |
| 14 | Mar | No Promo | 680.72 |
| 15 | Mar | Promo | 844.62 |
| 16 | May | No Promo | 728.14 |
| 17 | May | Promo | 841.68 |
| 18 | Nov | No Promo | 722.73 |
| 19 | Nov | Promo | 845.48 |
| 20 | Oct | No Promo | 704.59 |
| 21 | Oct | Promo | 818.75 |
| 22 | Sep | No Promo | 679.46 |
| 23 | Sep | Promo | 834.98 |
Holiday impact on customers by month¶
# Holiday impact on customers by month
analyze_stateholiday_impact(df_features, 'customers', 'month')
State Holiday Impact Analysis - Customers by Month:
======================================================================
Overall Holiday Impact:
-------------------------
Public         :  1,279 (+67.6% vs regular)
Easter         :  1,687 (+121.1% vs regular)
Christmas      :  1,569 (+105.7% vs regular)
Regular Days:    763 (baseline)

Holiday Impact by Month:
-----------------------------------
Apr         : Regular    773 | Holiday  1,619 | Impact +109.4%
Aug         : Regular    750 | Holiday    755 | Impact   +0.6%
Dec         : Regular    886 | Holiday  1,569 | Impact  +77.2%
Feb         : Regular    732 | No holiday data
Jan         : Regular    722 | Holiday  1,249 | Impact  +73.0%
Jul         : Regular    761 | No holiday data
Jun         : Regular    758 | Holiday  1,096 | Impact  +44.5%
Mar         : Regular    759 | Holiday  2,236 | Impact +194.6%
May         : Regular    777 | Holiday  1,418 | Impact  +82.6%
Nov         : Regular    782 | Holiday  2,578 | Impact +229.5%
Oct         : Regular    752 | Holiday  1,212 | Impact  +61.3%
Sep         : Regular    746 | No holiday data

Store Operations Impact:
-------------------------
Total store closures: 168,492 (17.1%)
Holiday closures: 30,140
Regular closures: 138,352
| | month | stateholiday | customers |
|---|---|---|---|
| 0 | Apr | Easter | 1618.79 |
| 1 | Apr | Normal Day | 772.97 |
| 2 | Aug | Normal Day | 750.41 |
| 3 | Aug | Public | 754.88 |
| 4 | Dec | Christmas | 1569.23 |
| 5 | Dec | Normal Day | 885.67 |
| 6 | Feb | Normal Day | 731.54 |
| 7 | Jan | Normal Day | 722.21 |
| 8 | Jan | Public | 1249.49 |
| 9 | Jul | Normal Day | 761.38 |
| 10 | Jun | Normal Day | 758.48 |
| 11 | Jun | Public | 1095.97 |
| 12 | Mar | Easter | 2235.94 |
| 13 | Mar | Normal Day | 758.85 |
| 14 | May | Normal Day | 776.91 |
| 15 | May | Public | 1418.38 |
| 16 | Nov | Normal Day | 782.22 |
| 17 | Nov | Public | 2577.62 |
| 18 | Oct | Normal Day | 751.84 |
| 19 | Oct | Public | 1212.47 |
| 20 | Sep | Normal Day | 745.74 |
print("✅ Data Visualization Impact Analysis completed.\n")
✅ Data Visualization Impact Analysis completed.
print("✅ Features Engineering and Data Visualization (I) completed successfully!")
print(f"🗓️ Analysis Date: {bold_start}{pd.Timestamp.now().strftime('%Y-%m-%d %H:%M:%S')}{bold_end}")
✅ Features Engineering and Data Visualization (I) completed successfully!
🗓️ Analysis Date: 2025-08-13 21:23:12
🌟 Advantages¶
Reusable across 'year', 'month', 'dayofweek', etc.
Easy to change aggregation type ('sum', 'median', etc.)
Consistent naming and sorting
Makes your code far more modular for dashboards or reporting
Why Reusability Matters¶
- 💡 Scalability: You can plug your functions into larger pipelines or production environments without rewrites.
- 🛠️ Maintainability: A bug fix in one utility can instantly improve multiple workflows.
- 🚀 Efficiency: Spend less time rewriting logic and more time interpreting results.
Why This Matters for Rossmann Store Sales¶
- We’ll likely repeat the same aggregations or visualizations across hundreds of stores.
- Promos, holidays, and weekday patterns demand consistent filtering and analysis.
- Modular functions help you prototype insights fast, scale across stores, and iterate smoothly.
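In this spirit, the whole section boils down to one reusable skeleton. The helper below is a hypothetical sketch (`summarize` and its parameters are illustrative names, not part of the project code):

```python
import pandas as pd

def summarize(df, group_col, target_col, agg='mean', min_value=0):
    """Hypothetical reusable helper: filter non-operational rows,
    group by a dynamic dimension, aggregate, and rank descending."""
    open_rows = df[df[target_col] > min_value]
    return (open_rows.groupby(group_col)[target_col]
            .agg(agg)
            .sort_values(ascending=False))

# Usage on synthetic data (0 marks a closed-store day)
df = pd.DataFrame({'year': [2013, 2013, 2014, 2014],
                   'sales': [0, 100, 200, 300]})
print(summarize(df, 'year', 'sales'))         # mean: 2014 -> 250, 2013 -> 100
print(summarize(df, 'year', 'sales', 'sum'))  # sum:  2014 -> 500, 2013 -> 100
```

Swapping `agg='mean'` for `'sum'` or `'median'` changes the analysis without touching the grouping or filtering logic, which is exactly the reusability argument above.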
# End analysis
viz_impact_analysis_end = pd.Timestamp.now()
duration = viz_impact_analysis_end - viz_impact_analysis_begin
# Final summary print
print("\n📋 Features Engineering && Data Viz Summary")
print(f"🟢 Begin Date: {bold_start}{viz_impact_analysis_begin.strftime('%Y-%m-%d %H:%M:%S')}{bold_end}")
print(f"✅ End Date: {bold_start}{viz_impact_analysis_end.strftime('%Y-%m-%d %H:%M:%S')}{bold_end}")
print(f"⏱️ Duration: {bold_start}{str(duration)}{bold_end}")
📋 Features Engineering & Data Viz Summary
🟢 Begin Date: 2025-08-13 21:23:00
✅ End Date: 2025-08-13 21:23:12
⏱️ Duration: 0 days 00:00:12.616810
Project Design Rationale: Notebook Separation¶
To promote clarity, maintainability, and scalability within the project, data engineering and visualization tasks are intentionally separated into distinct notebooks. This modular approach prevents the accumulation of excessive code in a single notebook, making it easier to debug, update, and collaborate across different stages of the workflow. By isolating data transformation logic from visual analysis, each notebook remains focused and purpose-driven, ultimately enhancing the overall efficiency and readability of the project.